Abstract: Motivated by recent work on entanglement-assisted codes for sending messages over classical channels, the larger, easily characterised class of non-signalling codes is defined. Analysing the optimal performance of these codes yields an alternative proof of the finite block length converse of Polyanskiy, Poor and Verdú, and shows that they achieve this converse. This provides an explicit formulation of the converse as a linear program which has some useful features. For discrete memoryless channels, it is shown that non-signalling codes attain the channel capacity with zero error probability if and only if the dispersion of the channel is zero.



I. Introduction 

A key goal of information theory is to quantify the extent to which reliable communication is possible over a noisy channel. A code of size $M$ and block length $n$ allows communication of one of $M$ messages via $n$ uses of the channel. The fundamental tradeoff between these quantities and the reliability of communication is captured by $M_\epsilon(\mathcal{E}^n)$, the largest size of code with error probability at most $\epsilon$ (for equiprobable messages). Emphasis is often placed on quantifying the asymptotics of the large $n$ limit (by computing channel capacities or reliability functions, for example), but this information is not necessarily useful if one wishes, for instance, to compute a lower bound on the block length needed for a certain rate and error probability.

While actually computing $M_\epsilon(\mathcal{E}^n)$ is intractable, it is possible to obtain lower (achievability) and upper (converse) bounds on it from which the asymptotic quantities can be derived, but which also give useful answers to questions concerning finite block lengths. In their recent paper [1], Polyanskiy, Poor and Verdú prove a very general converse bound (the 'PPV converse' for the purposes of this article)
$$M_\epsilon(\mathcal{E}^n) \le M_\epsilon^{\mathrm{PPV}}(\mathcal{E}^n), \tag{1}$$
where $M_\epsilon^{\mathrm{PPV}}(\mathcal{E}^n)$ is given by a maximin optimisation of the reciprocal of the minimum type II error over a set of hypothesis tests. They go on to show how many existing converse results can be easily derived from theirs.

Recent work has shown that it can be advantageous in classical coding over classical channels for the sender and receiver to share entangled quantum systems [2], [3], [4], [5], [6]. While the capacity cannot be increased, the number of messages possible for a given error bound can be. Entanglement assistance can even increase the zero-error capacity [6]. This raises questions about the extent to which entanglement can assist in general.

In an entanglement-assisted code, the output of the decoder is conditionally independent of the input to the encoder given the input to the decoder, and vice versa. A non-signalling (NS) code is any code with this property, and $M_\epsilon^{\mathrm{NS}}(\mathcal{E}^n)$ is the largest size of NS code with error probability at most $\epsilon$. Any upper bound on $M_\epsilon^{\mathrm{NS}}(\mathcal{E}^n)$ clearly applies to entanglement-assisted codes.

From the elegant proof of the PPV converse [1], it can be seen that it also applies to non-signalling codes¹, that is,
$$M_\epsilon^{\mathrm{NS}}(\mathcal{E}^n) \le M_\epsilon^{\mathrm{PPV}}(\mathcal{E}^n). \tag{2}$$
This fact, combined with lower bounds on $M_\epsilon(\mathcal{E}^n)$, provides quite stringent limits on the advantage from entanglement assistance. Section II precisely defines the concepts and quantities of interest, and recaps the proof of the PPV converse.

Section III analyses the performance of non-signalling codes directly, deriving a linear program for a quantity $M^*_\epsilon(\mathcal{E}^n)$ whose integer part $\lfloor M^*_\epsilon(\mathcal{E}^n)\rfloor$ is precisely $M_\epsilon^{\mathrm{NS}}(\mathcal{E}^n)$. Clearly this quantity is an upper bound on $M_\epsilon(\mathcal{E}^n)$ and, as mentioned, no larger than $M_\epsilon^{\mathrm{PPV}}(\mathcal{E}^n)$. Remarkably, it turns out that $M^*_\epsilon(\mathcal{E}^n)$ is precisely equal to $M_\epsilon^{\mathrm{PPV}}(\mathcal{E}^n)$. This provides an alternative proof of the PPV converse (for discrete channels), which shows that it is achieved by NS codes, and provides primal and dual linear programs (LPs) for it, which are useful for computing the bound: the duality theorem for LPs means that any feasible point for the dual LP gives a valid converse bound, and can allow for certification of optimality. There is also an operationally intuitive way to use symmetry of the channel to reduce the size of the linear programs, from exponential to polynomial in $n$ in the case of discrete memoryless channels (DMCs).

Section IV shows that the DMCs for which non-signalling codes can attain the channel capacity with zero error are precisely those with zero channel dispersion, which thus also admit particularly efficient classical codes. The final section concludes with some suggested directions for future research.

II. Definitions and Background 

We consider a single use of a discrete channel with input alphabet $A$ and output alphabet $B$. The channel input and output are random variables (RVs) $X$ and $Y$, respectively. Our description of the channel use $\mathcal{E}$ determines the probabilities $\mathcal{E}(y|x) := \Pr(Y=y|X=x,\mathcal{E})$. A message $W$ is selected from a set of $M$ possible messages $\{1,2,\ldots,M\}$ by a source $S$, which determines the probabilities $S(w) := \Pr(W=w|S)$. A code $Z$ consists of an encoder, which takes input $W$ and whose output is the channel input $X$ from $A$, and a decoder whose input is the channel output $Y$ (in $B$) and which produces a decoding $\hat W$ of the message. The code $Z$ determines the probabilities
$$Z(x,\hat w|w,y) := \Pr(X=x,\hat W=\hat w|W=w,Y=y,Z). \tag{3}$$
An error has occurred if $\hat W \neq W$.

¹My thanks to an anonymous referee for pointing this out.

Remark 1. When considering $n$ uses of a channel, the alphabets are $A^n$ and $B^n$ and the channel use is $\mathcal{E}^n$, which gives the conditional probabilities of output strings $\mathbf{y} = y_1\ldots y_n \in B^n$ given each input string $\mathbf{x} = x_1\ldots x_n \in A^n$. A discrete channel is fully described by specifying $\mathcal{E} = \{\mathcal{E}^n : n \in \mathbb{N}\}$. A discrete memoryless channel (DMC) is one where $\mathcal{E}^n(\mathbf{y}|\mathbf{x}) = \mathcal{E}^{\otimes n}(\mathbf{y}|\mathbf{x}) := \prod_{i=1}^{n}\mathcal{E}(y_i|x_i)$ for all $n$.

In a classical code, the encoder and decoder are uncorrelated, in the sense that
$$Z(x,\hat w|w,y) = F(x|w)\,G(\hat w|y) \tag{4}$$
for some conditional probability distributions $F$ and $G$. This property defines the class NC of codes with No Correlation (in the absence of a channel) between the encoder and the decoder.

Definition 2. SE (Shared Entanglement) is the class of entanglement-assisted codes which can be implemented by local operations of the encoder and decoder on quantum systems (with finite-dimensional Hilbert spaces) in a shared entangled state.

A positive operator valued measure (POVM) $L$ for a Hilbert space $\mathcal{H}$ and finite set of outcomes $R$ assigns positive (semidefinite) operators $L(r)$ on $\mathcal{H}$ to the outcomes $r \in R$ such that $\sum_{r\in R} L(r) = I$, where $I$ is the identity operator on $\mathcal{H}$. A code $Z$ is in SE iff there exist finite-dimensional Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$, POVMs $E_w$ for $\mathcal{H}_A$ with outcomes in $A$ for $w \in \{1,\ldots,M\}$, POVMs $D_y$ for $\mathcal{H}_B$ with outcomes in $\{1,\ldots,M\}$ for $y \in B$, and a density operator $\rho_{AB}$ on $\mathcal{H}_A \otimes \mathcal{H}_B$, such that
$$Z(x,\hat w|w,y) = \operatorname{Tr}\big[(E_w(x)\otimes D_y(\hat w))\,\rho_{AB}\big].$$
The class SE contains the class NC, and is itself contained in the class of non-signalling codes:

Definition 3. A non-signalling (NS) code is any one for which the marginal distribution of the output of the decoder is conditionally independent of the input to the encoder given the input to the decoder, and vice versa. That is, for all $x \in A$, $y \in B$, $w, \hat w \in \{1,\ldots,M\}$,
$$\Pr(\hat W=\hat w|W=w,Y=y,Z) = \Pr(\hat W=\hat w|Y=y,Z), \tag{5}$$
$$\Pr(X=x|W=w,Y=y,Z) = \Pr(X=x|W=w,Z). \tag{6}$$
These conditions define the class NS of non-signalling assisted codes.
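As a concrete illustration of conditions (5) and (6), the following Python sketch checks whether a finite code is non-signalling. It assumes a representation chosen here for illustration: `Z` is a dictionary mapping `(x, w_hat, w, y)` (all labels starting from 0) to $Z(x,\hat w|w,y)$. The function names `is_non_signalling` and `classical_code` are hypothetical helpers, not taken from the paper.

```python
from itertools import product


def is_non_signalling(Z, nA, nB, M, tol=1e-12):
    """Check conditions (5) and (6) for a code Z.

    Z maps (x, w_hat, w, y) -> Z(x, w_hat | w, y); inputs, outputs and
    messages are labelled 0..nA-1, 0..nB-1 and 0..M-1.
    """
    for w, y in product(range(M), range(nB)):
        total = sum(Z[x, wh, w, y] for x in range(nA) for wh in range(M))
        if abs(total - 1) > tol:
            raise ValueError(f"Z(., . | w={w}, y={y}) is not normalised")
    # (5): Pr(W_hat = wh | W = w, Y = y) = sum_x Z(x, wh | w, y) must not depend on w.
    for wh, y in product(range(M), range(nB)):
        marg = [sum(Z[x, wh, w, y] for x in range(nA)) for w in range(M)]
        if max(marg) - min(marg) > tol:
            return False
    # (6): Pr(X = x | W = w, Y = y) = sum_wh Z(x, wh | w, y) must not depend on y.
    for x, w in product(range(nA), range(M)):
        marg = [sum(Z[x, wh, w, y] for wh in range(M)) for y in range(nB)]
        if max(marg) - min(marg) > tol:
            return False
    return True


def classical_code(F, G, nA, nB, M):
    """Product code (4): Z(x, wh | w, y) = F[w][x] * G[y][wh]."""
    return {(x, wh, w, y): F[w][x] * G[y][wh]
            for x, wh, w, y in product(range(nA), range(M), range(M), range(nB))}


identity = [[1.0, 0.0], [0.0, 1.0]]
print(is_non_signalling(classical_code(identity, identity, 2, 2, 2), 2, 2, 2))  # True

# A "code" whose decoder outputs W and whose encoder outputs Y signals both ways.
cheat = {(x, wh, w, y): float(x == y and wh == w)
         for x, wh, w, y in product(range(2), repeat=4)}
print(is_non_signalling(cheat, 2, 2, 2))  # False
```

Every classical code of the form (4) passes the check, since its marginals are simply $F$ and $G$.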



From Bayes' rule and (6),
$$Z(x,\hat w|w,y) = \Pr(\hat W=\hat w|W=w,Y=y,X=x,Z)\,p(x|w,Z), \tag{7}$$
where $p(x|w,Z) := \Pr(X=x|W=w,Z)$. This can be interpreted operationally as indicating that if (6) holds, then $Z$ could be implemented by having the encoder stochastically generate $X$ according to the value of $W$, and then send the values of $X$ and $W$ to the decoder (using additional communication), which would use these, in addition to $Y$, to determine how to generate $\hat W$. Using (7) and the fact that
$$\Pr(Y=y,X=x|W=w,Z,\mathcal{E}) = \mathcal{E}(y|x)\,p(x|w,Z), \tag{8}$$
it is easy to show that
$$\Pr(\hat W=\hat w,Y=y,X=x|W=w,\mathcal{E},Z,S) = Z(x,\hat w|w,y)\,\mathcal{E}(y|x). \tag{9}$$

Proposition 4. The conditional probabilities (9) are clearly non-negative. To form a valid conditional distribution, they must also satisfy
$$\forall w:\ \sum_{\hat w,x,y} Z(x,\hat w|w,y)\,\mathcal{E}(y|x) = 1. \tag{10}$$
This is true for all channels $\mathcal{E}$ if and only if $Z$ is non-signalling from the receiver to the sender (this is the condition expressed by (6)).

Proof: (10) is a straightforward consequence of (6) via (7). For the other direction, if $Z$ is signalling from Bob (the decoder) to Alice (the encoder), then there exist $w' \in \{1,\ldots,M\}$, $x' \in A$ and $y_0, y_1 \in B$ such that $\sum_{\hat w} Z(x',\hat w|w',y_0) > \sum_{\hat w} Z(x',\hat w|w',y_1)$. Choosing the channel $\mathcal{E}$ with $\mathcal{E}(y_0|x') = 1$ and, for all $x \neq x'$, $\mathcal{E}(y_1|x) = 1$,
$$\sum_{x,\hat w}\mathcal{E}(y_0|x)\,Z(x,\hat w|w',y_0) > \sum_{x,\hat w}\mathcal{E}(y_0|x)\,Z(x,\hat w|w',y_1). \tag{11}$$
Since $\forall x: \mathcal{E}(y_0|x) = 1 - \mathcal{E}(y_1|x)$ and $\sum_{x,\hat w} Z(x,\hat w|w',y_1) = 1$, this implies that
$$\sum_{\hat w,x,y} Z(x,\hat w|w',y)\,\mathcal{E}(y|x) > 1. \tag{12}$$
∎

For the rest of this paper, the source is taken to be $S_M$, which assigns equal probability to each message: $\forall w: S_M(w) = 1/M$.

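Definition 5. For channel $\mathcal{E}$, the minimum average probability of error which can be achieved by a code in class $\Omega$ is
$$\epsilon^{\Omega}(M,\mathcal{E}) := \min\{\Pr(\hat W\neq W|Z,\mathcal{E},S_M) : Z \in \Omega\},$$
and the largest code in class $\Omega$ with error no larger than $\epsilon$ has size
$$M_\epsilon^{\Omega}(\mathcal{E}) := \max\{M : \Pr(\hat W\neq W|Z,\mathcal{E},S_M) \le \epsilon,\ Z \in \Omega\}.$$
When the superscript $\Omega$ is omitted, it is intended that $\Omega = \mathrm{NC}$.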

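Remark 6. By the inclusions of the classes of codes,
$$\epsilon(M,\mathcal{E}) \ge \epsilon^{\mathrm{SE}}(M,\mathcal{E}) \ge \epsilon^{\mathrm{NS}}(M,\mathcal{E}), \tag{13}$$
and
$$M_\epsilon(\mathcal{E}) \le M_\epsilon^{\mathrm{SE}}(\mathcal{E}) \le M_\epsilon^{\mathrm{NS}}(\mathcal{E}). \tag{14}$$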
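Remark 7. Note that if $\mathcal{E}(y|x) = q(y)$, then for any NS code $Z$, using (9) and (5),
$$\Pr(\hat W\neq W|Z,\mathcal{E},S_M) = 1 - \frac{1}{M}\sum_{w}\sum_{x,y} Z(x,w|w,y)\,q(y) = 1 - \frac{1}{M}\sum_{y}\sum_{w}\Pr(\hat W=w|Y=y,Z)\,q(y) = 1 - \frac{1}{M}. \tag{15}$$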
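Equation (9) gives the average error probability of any code directly, as $1-\frac{1}{M}\sum_{w,x,y}Z(x,w|w,y)\mathcal{E}(y|x)$. A minimal sketch, using a hypothetical dictionary representation of $Z$ keyed by `(x, w_hat, w, y)` and a channel matrix `E[x][y]` $=\mathcal{E}(y|x)$, evaluates it and reproduces Remark 7 for an example chosen here for illustration (not from the paper): a Popescu-Rohrlich-box style NS code, used over a channel whose output ignores its input.

```python
from itertools import product


def error_probability(Z, E, M):
    """Pr(W_hat != W) under the uniform source S_M, from (9):
    1 - (1/M) * sum_{w,x,y} Z(x, w | w, y) E(y|x)."""
    nA, nB = len(E), len(E[0])
    success = sum(Z[x, w, w, y] * E[x][y]
                  for x, w, y in product(range(nA), range(M), range(nB)))
    return 1 - success / M


# PR-box style NS code with M = 2 messages and binary X, Y:
# Z(x, wh | w, y) = 1/2 if (x XOR wh) == (w AND y), else 0.
pr_box = {(x, wh, w, y): 0.5 if (x ^ wh) == (w & y) else 0.0
          for x, wh, w, y in product(range(2), repeat=4)}

q = [0.3, 0.7]
useless = [q, q]  # E(y|x) = q(y) for every input x
print(error_probability(pr_box, useless, 2))  # approximately 1 - 1/M = 0.5
```

Any other NS code gives the same value on this channel, because condition (5) makes the decoder's output independent of $W$.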



$\epsilon(M,\mathcal{E})$ and $M_\epsilon(\mathcal{E})$ are in general both hard to compute and to analyse. This motivates the desire for bounds on these quantities which are more amenable to computation and/or analysis. Many previously established converse results can be derived from the following result of Polyanskiy, Poor and Verdú:

Definition 8. For a finite set $C$, let $\mathcal{P}(C)$ denote the set of probability distributions on $C$. Given distributions $P^{(0)}, P^{(1)} \in \mathcal{P}(C)$ (and identifying $P^{(0)}$ with the null hypothesis), let $\beta_{1-\epsilon}(P^{(0)}, P^{(1)})$ denote the minimum type II error $\sum_{r\in C} T_r P^{(1)}(r)$ that can be achieved by statistical tests $T$ (with $0 \le T_r \le 1$) which give a type I error no greater than $\epsilon$, i.e. $\sum_{r\in C} T_r P^{(0)}(r) \ge 1-\epsilon$.
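Computing $\beta_{1-\epsilon}$ is a fractional knapsack problem, so the Neyman-Pearson rule (accept outcomes in decreasing order of the likelihood ratio $P^{(0)}(r)/P^{(1)}(r)$, randomising on the last one) is optimal. The sketch below is a hypothetical helper `beta(alpha, P0, P1)` with `alpha` $=1-\epsilon$. The example, one use of a binary symmetric channel with crossover $0.1$ and uniform $P_X$ and $Q_Y$, is an assumption chosen for illustration; it only evaluates the objective of (16) at those particular distributions.

```python
def beta(alpha, P0, P1):
    """beta_alpha(P0, P1): minimum type II error sum_r T_r P1(r) over tests
    0 <= T_r <= 1 with sum_r T_r P0(r) >= alpha (here alpha = 1 - eps).

    Greedy Neyman-Pearson rule: accept outcomes in decreasing order of
    P0(r) / P1(r), accepting the last one needed only with some probability.
    """
    def ratio(r):
        return P0[r] / P1[r] if P1[r] > 0 else float("inf")

    need, cost = alpha, 0.0
    for r in sorted(range(len(P0)), key=ratio, reverse=True):
        if need <= 0:
            break
        if P0[r] == 0:
            continue  # accepting r would add type II error without helping
        t = min(1.0, need / P0[r])  # probability of accepting outcome r
        need -= t * P0[r]
        cost += t * P1[r]
    return cost


delta = 0.1
# Outcomes (x, y) in the order 00, 01, 10, 11.
P_XY = [0.5 * (1 - delta), 0.5 * delta, 0.5 * delta, 0.5 * (1 - delta)]
P_XQ = [0.25] * 4  # P_X x Q_Y with both uniform
print(beta(0.9, P_XY, P_XQ))  # 0.5
```

Here $1/\beta = 2$; obtaining $M_\epsilon^{\mathrm{PPV}}$ itself still requires the max-min over $P_X$ and $Q_Y$ in (16).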

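Theorem 9 (PPV converse, Theorem 27 of [1]). The number of messages which can be transmitted by an NS code with error probability no greater than $\epsilon$ obeys $M_\epsilon^{\mathrm{NS}}(\mathcal{E}) \le M_\epsilon^{\mathrm{PPV}}(\mathcal{E})$, where
$$M_\epsilon^{\mathrm{PPV}}(\mathcal{E}) := \max_{P_X\in\mathcal{P}(A)}\ \min_{Q_Y\in\mathcal{P}(B)}\ \frac{1}{\beta_{1-\epsilon}(P_{XY}, P_X\times Q_Y)} \tag{16}$$
with $P_{XY}(x,y) := P_X(x)\,\mathcal{E}(y|x)$.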



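Proof: Define two hypotheses $H_0$ and $H_1$ to explain the data $X=x$, $Y=y$: in both, $Z$ is an NS code of size $M$, but in $H_i$ the channel is $\mathcal{E}_i$. Let $\epsilon_i$ denote the error probability attained by the code if the channel is $\mathcal{E}_i$:
$$1-\epsilon_i := \Pr(\hat W=W|Z,\mathcal{E}_i,S_M) = \sum_{x,y} T_{xy}\,P^{(i)}_{XY}(x,y), \tag{17}$$
where
$$P^{(i)}_{XY}(x,y) := \Pr(X=x,Y=y|Z,\mathcal{E}_i,S_M) \tag{18}$$
$$= \mathcal{E}_i(y|x)\,\Pr(X=x|Z,S_M) \tag{19}$$
and
$$T_{xy} := \Pr(\hat W=W|X=x,Y=y,Z,S_M), \tag{20}$$
which, using (9) and (19), is
$$T_{xy} = \frac{\sum_{w} Z(x,w|w,y)}{\sum_{w} p(x|w,Z)}. \tag{21}$$
This expression does not depend on which hypothesis holds. The direct part of Proposition 4 guarantees that this is a valid statistical test, which proves that
$$\beta_{1-\epsilon_0}(P^{(0)}_{XY}, P^{(1)}_{XY}) \le 1-\epsilon_1. \tag{22}$$
Setting $\mathcal{E}_0 = \mathcal{E}$ and $\mathcal{E}_1(y|x) = Q_Y(y)$ (see Remark 7) shows that, for any NS code with $\Pr(X=x|Z,S_M) = P_X(x)$, $\epsilon$ and $M$ must satisfy
$$\max_{Q_Y}\beta_{1-\epsilon}(P_{XY}, P_X\times Q_Y) \le \frac{1}{M}, \tag{23}$$
and therefore
$$\min_{P_X}\max_{Q_Y}\beta_{1-\epsilon}(P_{XY}, P_X\times Q_Y) \le \frac{1}{M}. \tag{24}$$
∎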



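Definition 10. For codes in class $\Omega$, the $\epsilon$-error capacity of $\mathcal{E}$ is
$$C_\epsilon^{\Omega}(\mathcal{E}) := \lim_{n\to\infty}\frac{1}{n}\log M_\epsilon^{\Omega}(\mathcal{E}^n), \tag{25}$$
and the capacity is
$$C^{\Omega}(\mathcal{E}) := \lim_{\epsilon\to 0} C_\epsilon^{\Omega}(\mathcal{E}). \tag{26}$$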



The fact that the PPV converse applies to NS codes has some immediate consequences:

Remark 11. Since the information spectrum converse that Verdú and Han use to derive their general formula for channel capacity [7] can be derived from the PPV converse, this formula also gives the capacity for NS codes.

Remark 12. For DMCs, a converse derived from the PPV converse and an achievability bound for classical codes can be used to prove [1] the result of Strassen [8],
$$\log M_\epsilon(\mathcal{E}^{\otimes n}) = nC - \sqrt{nV}\,Q^{-1}(\epsilon) + O(\log n), \tag{27}$$
where $C$ is the channel capacity, $V$ is the channel dispersion (see Section IV) and $Q(x) := (2\pi)^{-1/2}\int_x^\infty e^{-t^2/2}\,dt$.

Since the PPV converse also applies to NS codes, (27) applies to these too, and the difference in the rates achieved by classical and NS codes (for fixed $\epsilon$) is only of order $O((\log n)/n)$.
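To see the size of the terms in (27), the following sketch evaluates $nC - \sqrt{nV}\,Q^{-1}(\epsilon)$ for a binary symmetric channel with crossover probability $\delta$, for which $C = 1-h(\delta)$ bits and $V = \delta(1-\delta)\log_2^2\frac{1-\delta}{\delta}$ (the variance of the information density under the uniform input). The $O(\log n)$ term is dropped, so the result is an approximation rather than a bound; by the remark above it describes NS codes as well as classical ones. The function name is hypothetical; `statistics.NormalDist().inv_cdf` supplies $\Phi^{-1}$, and $Q^{-1}(\epsilon) = \Phi^{-1}(1-\epsilon)$.

```python
from math import log2, sqrt
from statistics import NormalDist


def bsc_normal_approx(n, delta, eps):
    """n*C - sqrt(n*V) * Q^{-1}(eps) in bits for a BSC(delta), dropping O(log n)."""
    h = -delta * log2(delta) - (1 - delta) * log2(1 - delta)
    C = 1 - h                                             # capacity, bits per use
    V = delta * (1 - delta) * log2((1 - delta) / delta) ** 2  # dispersion, bits^2
    q_inv = NormalDist().inv_cdf(1 - eps)                 # Q^{-1}(eps), Q = 1 - Phi
    return n * C - sqrt(n * V) * q_inv


for n in (100, 1000, 10000):
    print(n, bsc_normal_approx(n, 0.11, 1e-3) / n)  # approximate rate, bits per use
```

The printed rates approach $C \approx 0.5$ bits per use from below, with a gap that shrinks like $1/\sqrt{n}$.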

III. The performance of non-signalling codes 

The optimisation over codes that yields $\epsilon^{\mathrm{NS}}(M,\mathcal{E})$ in Definition 5 is already a linear program (LP): the variable is the code $Z$ (considered as a $|A||B|M^2$-dimensional real vector), the objective function $\Pr(\hat W\neq W|Z,\mathcal{E},S_M)$ is
$$1 - \frac{1}{M}\sum_{w,x,y}\mathcal{E}(y|x)\,Z(x,w|w,y), \tag{28}$$
and the constraints are simply the linear equalities comprising the non-signalling conditions (5) and (6), in addition to the non-negativity and normalisation of $Z$.



W 



tt{W) 




d 


TT^W) 









Fig. 1. Operational interpretation of the code Z which results from the 
symmetrisation {29\ of a non-signalling code. The boxes marked 'e' and 'd' 
are the encoder and decoder for the original non-signalling code Z. The 
permutations are coordinated by a shared random variable tt drawn uniformly 
at random from the symmetric group on {1, . . . , A/}. 
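If $Z$ is an NS code, then let
$$\bar Z(x,\hat w|w,y) := \frac{1}{|G|}\sum_{\pi\in G} Z(x,\pi(\hat w)|\pi(w),y), \tag{29}$$
where $G$ is the symmetric group on $\{1,\ldots,M\}$ and $\pi(w)$ denotes the action of a permutation $\pi \in G$ on $w \in \{1,\ldots,M\}$. This symmetrised code $\bar Z(x,\hat w|w,y)$ has an operational interpretation given in Fig. 1, from which it is clear that it is also non-signalling, and since
$$\Pr(\hat W=W|\bar Z,\mathcal{E},S_M) \tag{30}$$
$$= \frac{1}{M}\sum_{w}\sum_{x,y}\mathcal{E}(y|x)\,\bar Z(x,w|w,y) \tag{31}$$
$$= \frac{1}{M|G|}\sum_{\pi\in G}\sum_{w}\sum_{x,y}\mathcal{E}(y|x)\,Z(x,\pi(w)|\pi(w),y) \tag{32}$$
$$= \Pr(\hat W=W|Z,\mathcal{E},S_M), \tag{33}$$
the optimisation over NS codes for $\epsilon^{\mathrm{NS}}(M,\mathcal{E})$ can be restricted to symmetrised codes. These are precisely those codes with the form
$$Z(x,\hat w|w,y) = \begin{cases} R_{xy} & \text{if } \hat w = w,\\ Q_{xy} & \text{if } \hat w \neq w.\end{cases} \tag{34}$$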
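For small $M$ the average in (29) can be carried out literally over all $M!$ permutations. The sketch below uses hypothetical helper names, the same dictionary representation of codes keyed by `(x, w_hat, w, y)`, and an example code and channel chosen here for illustration. It checks that the symmetrised code has the form (34) and prints the success probabilities before and after symmetrisation, which agree as in (33).

```python
from itertools import permutations, product


def symmetrise(Z, nA, nB, M):
    """(29): average Z(x, pi(wh) | pi(w), y) over all permutations pi of the messages."""
    perms = list(permutations(range(M)))
    return {(x, wh, w, y): sum(Z[x, pi[wh], pi[w], y] for pi in perms) / len(perms)
            for x, wh, w, y in product(range(nA), range(M), range(M), range(nB))}


def success_probability(Z, E, M):
    """Pr(W_hat = W) under the uniform source, as in (31)."""
    return sum(Z[x, w, w, y] * E[x][y]
               for x, w, y in product(range(len(E)), range(M), range(len(E[0])))) / M


# Example NS code (PR-box style) and an arbitrary binary channel E[x][y] = E(y|x).
pr_box = {(x, wh, w, y): 0.5 if (x ^ wh) == (w & y) else 0.0
          for x, wh, w, y in product(range(2), repeat=4)}
E = [[0.9, 0.1], [0.2, 0.8]]
Zbar = symmetrise(pr_box, 2, 2, 2)

# Form (34): one value R_xy when w_hat == w and one value Q_xy otherwise.
for x, y in product(range(2), range(2)):
    assert Zbar[x, 0, 0, y] == Zbar[x, 1, 1, y]
    assert Zbar[x, 1, 0, y] == Zbar[x, 0, 1, y]
print(success_probability(pr_box, E, 2), success_probability(Zbar, E, 2))
```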



In these terms, the non-signalling condition (6) is equivalent to saying that there exists $p: A \to \mathbb{R}$ such that $R_{xy} + (M-1)Q_{xy} = p(x)$, and so
$$Z(x,\hat w|w,y) = \begin{cases} R_{xy} & \text{if } \hat w = w,\\ (p(x)-R_{xy})/(M-1) & \text{if } \hat w \neq w.\end{cases} \tag{35}$$
With this simplification, the conditional probabilities in $Z$ are non-negative iff $R_{xy} \ge 0$ and $p(x) \ge R_{xy}$ for all $x, y$, and the normalisation condition $\forall w,y: \sum_{x,\hat w} Z(x,\hat w|w,y) = 1$ is equivalent to $\sum_x p(x) = 1$. The condition (5) of no signalling from encoder to decoder is $\forall y: \sum_x R_{xy} = \sum_x (p(x)-R_{xy})/(M-1)$ which, in light of the normalisation condition, is equivalent to
$$\forall y:\ \sum_x R_{xy} = \frac{1}{M}. \tag{36}$$
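Conversely, any $R$ and $p$ satisfying these constraints define an NS code through (35), whose success probability is $\sum_{x,y}\mathcal{E}(y|x)R_{xy}$. A minimal sketch with hypothetical helper names; the values below, for a noiseless binary channel with $M = 4$, are an example chosen here for illustration.

```python
from itertools import product


def code_from_R(R, p, M):
    """Build Z(x, wh | w, y) from (35): R[x][y] if wh == w, else (p[x] - R[x][y]) / (M - 1)."""
    nA, nB = len(R), len(R[0])
    return {(x, wh, w, y): R[x][y] if wh == w else (p[x] - R[x][y]) / (M - 1)
            for x, wh, w, y in product(range(nA), range(M), range(M), range(nB))}


def check_constraints(R, p, M, tol=1e-12):
    """0 <= R_xy <= p(x), sum_x p(x) = 1 and (36): sum_x R_xy = 1/M for every y."""
    nA, nB = len(R), len(R[0])
    return (all(0 <= R[x][y] <= p[x] + tol for x in range(nA) for y in range(nB))
            and abs(sum(p) - 1) < tol
            and all(abs(sum(R[x][y] for x in range(nA)) - 1 / M) < tol for y in range(nB)))


E = [[1.0, 0.0], [0.0, 1.0]]  # noiseless binary channel, E[x][y] = E(y|x)
M = 4
p = [0.5, 0.5]
R = [[0.25, 0.0], [0.0, 0.25]]
assert check_constraints(R, p, M)

Z = code_from_R(R, p, M)
for w, y in product(range(M), range(2)):  # normalisation of Z(., . | w, y)
    assert abs(sum(Z[x, wh, w, y] for x in range(2) for wh in range(M)) - 1) < 1e-12

error = 1 - sum(E[x][y] * R[x][y] for x in range(2) for y in range(2))
print(error)  # 0.5, i.e. 1 - 2/M
```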



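Now relax (36) to $\forall y: \sum_x R_{xy} \le 1/M$. Given a point satisfying the relaxed constraints with $\sum_{x\in A} R_{xy'} < 1/M$ for some $y' \in B$, for any $\lambda \in [0,1]$,
$$R'_{xy} = \begin{cases} (1-\lambda)R_{xy} + \lambda p(x) & \text{if } y = y',\\ R_{xy} & \text{otherwise}\end{cases} \tag{37}$$
is also feasible, and, since $\sum_{x} p(x) = 1 \ge 1/M$, there must exist $\lambda$ such that $\sum_{x\in A} R'_{xy'} = 1/M$. Since $R'_{xy} \ge R_{xy}$ for all $x, y$ and the success probability $\sum_{x,y}\mathcal{E}(y|x)R_{xy}$ is non-decreasing in each $R_{xy}$, this can only be an improvement on the original point, so (36) can be changed to an inequality, to obtain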



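Proposition 13.
$$1-\epsilon^{\mathrm{NS}}(M,\mathcal{E}) = \max\sum_{x\in A}\sum_{y\in B}\mathcal{E}(y|x)\,R_{xy} \tag{38}$$
$$\text{subject to} \tag{39}$$
$$\forall y\in B:\ \sum_{x\in A} R_{xy} \le 1/M, \tag{40}$$
$$\forall x\in A, y\in B:\ p(x) \ge R_{xy}, \tag{41}$$
$$\sum_{x\in A} p(x) = 1, \tag{42}$$
$$\forall x\in A, y\in B:\ R_{xy} \ge 0,\ p(x) \ge 0. \tag{43}$$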



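Introducing Lagrange multipliers $D_{xy}$, $z_y$, $a$ for the constraints (41), (40), (42) respectively, the Lagrangian function is
$$\sum_{x\in A,\,y\in B}\mathcal{E}(y|x)R_{xy} + \sum_{x\in A,\,y\in B} D_{xy}\big(p(x)-R_{xy}\big) \tag{44}$$
$$+ \sum_{y\in B} z_y\Big(\frac{1}{M} - \sum_{x\in A}R_{xy}\Big) + a\Big(1-\sum_{x\in A}p(x)\Big) \tag{45}$$
$$= \sum_{x\in A}\sum_{y\in B}\big(\mathcal{E}(y|x) - D_{xy} - z_y\big)R_{xy} + \sum_{x\in A}\Big(\sum_{y\in B}D_{xy} - a\Big)p(x) \tag{46}$$
$$+ a + \frac{1}{M}\sum_{y\in B} z_y. \tag{47}$$
Taking the supremum over non-negative $R$ and $p$ and restricting the multipliers to the region where it is finite yields the dual LP, whose solution is equal to that of the primal LP by the strong duality theorem for linear programming:
$$\epsilon^{\mathrm{NS}}(M,\mathcal{E}) = \max\ 1 - a - \frac{1}{M}\sum_{y\in B} z_y \tag{48}$$
$$\text{subject to} \tag{49}$$
$$\forall x\in A, y\in B:\ \mathcal{E}(y|x) \le D_{xy} + z_y, \tag{50}$$
$$\forall x\in A:\ \sum_{y\in B} D_{xy} \le a, \tag{51}$$
$$\forall x\in A, y\in B:\ D_{xy} \ge 0,\ z_y \ge 0. \tag{52}$$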



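Fixing $z$, one should pick $D_{xy} = \max\{\mathcal{E}(y|x) - z_y, 0\}$ and $a = \max_{x\in A}\sum_{y\in B} D_{xy}$, so the objective function is
$$1 - \max_{x\in A}\sum_{y\in B}\max\{\mathcal{E}(y|x)-z_y, 0\} - \frac{1}{M}\sum_{y\in B} z_y \tag{53}$$
$$= \min_{x\in A}\sum_{y\in B}\Big(\mathcal{E}(y|x) - \max\{\mathcal{E}(y|x)-z_y, 0\}\Big) - \frac{1}{M}\sum_{y\in B} z_y \tag{54}$$
$$= \min_{x\in A}\sum_{y\in B}\Big(\min\{\mathcal{E}(y|x), z_y\} - \frac{z_y}{M}\Big), \tag{55}$$
where (54) uses $\sum_{y}\mathcal{E}(y|x) = 1$. It remains to maximise over $z$: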

Proposition 14. The minimum error probability which can be attained by an NS code is
$$\epsilon^{\mathrm{NS}}(M,\mathcal{E}) = \max_{z\ge 0}\ \min_{x\in A}\sum_{y\in B}\Big(\min\{z_y,\mathcal{E}(y|x)\} - \frac{z_y}{M}\Big).$$
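Because Proposition 14 is a maximisation, every vector $z \ge 0$ yields a lower bound on $\epsilon^{\mathrm{NS}}(M,\mathcal{E})$. The sketch below (hypothetical helper names) evaluates the objective and searches a small candidate grid in which each $z_y$ is drawn from $\{0\}\cup\{\mathcal{E}(y|x): x \in A\}$. This restricted search is a heuristic assumed here for illustration, not an exact solver, and it is exponential in $|B|$.

```python
from itertools import product


def prop14_objective(E, M, z):
    """min_x sum_y (min{z_y, E(y|x)} - z_y / M) for a fixed vector z >= 0."""
    return min(sum(min(zy, exy) - zy / M for zy, exy in zip(z, row)) for row in E)


def eps_ns_lower_bound(E, M):
    """Best Proposition 14 objective over a finite grid of z vectors.

    Every z >= 0 gives a valid lower bound on eps^NS(M, E); restricting z_y to
    {0} U {E(y|x) : x in A} is a heuristic, not an exact optimisation.
    """
    nB = len(E[0])
    candidates = [sorted({0.0} | {row[y] for row in E}) for y in range(nB)]
    return max(prop14_objective(E, M, z) for z in product(*candidates))


E = [[1.0, 0.0], [0.0, 1.0]]  # noiseless binary channel, E[x][y] = E(y|x)
print(eps_ns_lower_bound(E, 4))  # 0.5
```

For this channel a feasible point of Proposition 13 with error $1 - 2/M$ also exists, so the grid value $0.5$ is tight at $M = 4$.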
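Allowing $M$ to take on real values in Proposition 13 and defining $\mu := 1/M$, it is evident that $\epsilon^{\mathrm{NS}}(M,\mathcal{E})$ is a piecewise linear, non-increasing, convex function of $\mu$ for $\mu \in [0,1]$ (equivalently, the optimal value $1-\epsilon^{\mathrm{NS}}$ of the maximisation (38), viewed as a function of the right-hand side of (40), is concave). Moreover, this can be inverted to obtain a linear program which gives the smallest value of $1/M$ such that there exists an NS code of size $\lfloor M\rfloor$ with error probability at most $\epsilon$ for $\mathcal{E}$. That is, $M_\epsilon^{\mathrm{NS}}(\mathcal{E}) = \lfloor M^*_\epsilon(\mathcal{E})\rfloor$, where
$$M^*_\epsilon(\mathcal{E})^{-1} := \min\ \mu \tag{56}$$
$$\text{subject to} \tag{57}$$
$$\forall y\in B:\ \sum_{x\in A} R_{xy} \le \mu, \tag{58}$$
$$\sum_{x\in A}\sum_{y\in B}\mathcal{E}(y|x)\,R_{xy} \ge 1-\epsilon, \tag{59}$$
$$\text{and the constraints (41)-(43).} \tag{60}$$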

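At this point, it is quite straightforward to show the claimed equivalence to the PPV converse:

Proposition 15.
$$M^*_\epsilon(\mathcal{E}) = M_\epsilon^{\mathrm{PPV}}(\mathcal{E}). \tag{61}$$

Proof: Writing out the optimisation that determines the PPV converse (Theorem 9) explicitly (with the shorthands $p(x) := P_X(x)$, $q(y) := Q_Y(y)$), it is clear that the function being optimised is bilinear in $T$ and $q$, both of which are constrained to finite-dimensional polytopes. Using von Neumann's minimax theorem [9],
$$M_\epsilon^{\mathrm{PPV}}(\mathcal{E})^{-1} = \min_p\max_q\min_T\sum_{x\in A}\sum_{y\in B} T_{xy}\,p(x)\,q(y) \tag{62}$$
$$= \min_p\min_T\max_q\sum_{x\in A}\sum_{y\in B} T_{xy}\,p(x)\,q(y) \tag{63}$$
$$= \min_p\min_T\max_{y\in B}\sum_{x\in A} T_{xy}\,p(x) \tag{64}$$
$$\text{subject to} \tag{65}$$
$$\sum_{x\in A}\sum_{y\in B}\mathcal{E}(y|x)\,p(x)\,T_{xy} \ge 1-\epsilon, \tag{66}$$
$$\sum_{x} p(x) = 1,\quad \sum_{y} q(y) = 1, \tag{67}$$
$$\forall x\in A, y\in B:\ 0 \le T_{xy} \le 1, \tag{68}$$
$$\forall x, y:\ p(x) \ge 0,\ q(y) \ge 0. \tag{69}$$
Writing $R_{xy} = p(x)T_{xy}$, this linear program is equivalent to
$$\min\ \mu \tag{70}$$
$$\text{subject to} \tag{71}$$
$$\forall y\in B:\ \sum_{x\in A} R_{xy} \le \mu, \tag{72}$$
$$\sum_{x\in A}\sum_{y\in B}\mathcal{E}(y|x)\,R_{xy} \ge 1-\epsilon, \tag{73}$$
$$\sum_{x\in A} p(x) = 1, \tag{74}$$
$$\forall x\in A, y\in B:\ 0 \le R_{xy} \le p(x), \tag{75}$$
which is exactly the primal LP for $M^*_\epsilon(\mathcal{E})^{-1}$. ∎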



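Since the maximisation of $1/\mu$ under the constraints (72)-(75), which yields $M^*_\epsilon(\mathcal{E})$ directly, is a linear-fractional program [10], the Charnes-Cooper transformation [11]
$$F_{xy} := R_{xy}/\mu,\quad v_x := p(x)/\mu,\quad t := 1/\mu, \tag{76}$$
can be used to transform it into a linear program for $M^*_\epsilon(\mathcal{E})$, from which $t$ can be eliminated by using the transformed version of (74), $\sum_{x\in A} v_x = t$, to obtain

Theorem 16. $M_\epsilon^{\mathrm{NS}}(\mathcal{E}) = \lfloor M^*_\epsilon(\mathcal{E})\rfloor$, where
$$M^*_\epsilon(\mathcal{E}) = \max\sum_{x\in A} v_x \tag{77}$$
$$\text{subject to} \tag{78}$$
$$\forall x\in A, y\in B:\ F_{xy} \le v_x, \tag{79}$$
$$\forall y\in B:\ \sum_{x\in A} F_{xy} \le 1, \tag{80}$$
$$\sum_{x\in A}\sum_{y\in B}\mathcal{E}(y|x)\,F_{xy} \ge (1-\epsilon)\sum_{x\in A} v_x, \tag{81}$$
$$\forall x\in A, y\in B:\ F_{xy} \ge 0,\ v_x \ge 0. \tag{82}$$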
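Any feasible point of this primal LP certifies the lower bound $M^*_\epsilon(\mathcal{E}) \ge \sum_x v_x$. A minimal sketch of a hypothetical feasibility checker, with a noiseless binary channel at $\epsilon = 0.5$ as an example chosen here for illustration:

```python
def primal_lower_bound(E, eps, F, v, tol=1e-9):
    """Return sum_x v_x if (F, v) satisfies (79)-(82), else None.

    E[x][y] = E(y|x).  A feasible point certifies M*_eps(E) >= sum_x v_x.
    """
    nA, nB = len(E), len(E[0])
    if any(vx < -tol for vx in v):
        return None  # violates (82)
    if any(not (-tol <= F[x][y] <= v[x] + tol) for x in range(nA) for y in range(nB)):
        return None  # violates (79) or (82)
    if any(sum(F[x][y] for x in range(nA)) > 1 + tol for y in range(nB)):
        return None  # violates (80)
    total = sum(v)
    achieved = sum(E[x][y] * F[x][y] for x in range(nA) for y in range(nB))
    if achieved < (1 - eps) * total - tol:
        return None  # violates (81)
    return total


E = [[1.0, 0.0], [0.0, 1.0]]  # noiseless binary channel
print(primal_lower_bound(E, 0.5, F=[[1, 0], [0, 1]], v=[2, 2]))  # 4
print(primal_lower_bound(E, 0.5, F=[[1, 0], [0, 1]], v=[3, 3]))  # None
```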

Since the main goal is to obtain upper bounds on $M$, the dual of this linear program is more useful. Introducing Lagrange multipliers $V_{xy}$, $c_y$ and $\zeta$ for the constraints (79), (80) and (81) respectively, taking the supremum of the resulting Lagrangian over non-negative $F$ and $v$, and restricting the multipliers to the region where it is finite gives the dual program:

Theorem 17. $M_\epsilon^{\mathrm{NS}}(\mathcal{E}) = \lfloor M^*_\epsilon(\mathcal{E})\rfloor$, where
$$M^*_\epsilon(\mathcal{E}) = \min\sum_{y\in B} c_y \tag{83}$$
$$\text{subject to} \tag{84}$$
$$\forall x\in A, y\in B:\ V_{xy} + c_y \ge \zeta\,\mathcal{E}(y|x), \tag{85}$$
$$\forall x\in A:\ \sum_{y\in B} V_{xy} \le (1-\epsilon)\zeta - 1, \tag{86}$$
$$\forall x\in A, y\in B:\ V_{xy} \ge 0,\ c_y \ge 0. \tag{87}$$
At any feasible point of this dual LP, the value of the objective function is an upper bound on $M_\epsilon^{\mathrm{NS}}(\mathcal{E})$.
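For fixed $c$ and $\zeta$, the smallest $V$ allowed by (85) and (87) is $V_{xy} = \max\{0, \zeta\mathcal{E}(y|x) - c_y\}$, so checking a dual certificate reduces to checking (86). A sketch of such a hypothetical certificate checker, with a noiseless binary channel as an example chosen here for illustration:

```python
def dual_upper_bound(E, eps, c, zeta, tol=1e-9):
    """Return sum_y c_y if (c, zeta) extends to a feasible point of the dual LP, else None.

    V_xy = max{0, zeta * E(y|x) - c_y} is the smallest choice satisfying (85)
    and (87), so only (86) needs checking.  A feasible point certifies
    M_eps^NS(E) <= floor(sum_y c_y).
    """
    if zeta < 0 or any(cy < 0 for cy in c):
        return None
    for row in E:  # row[y] = E(y|x) for one input x
        V = [max(0.0, zeta * exy - cy) for exy, cy in zip(row, c)]
        if sum(V) > (1 - eps) * zeta - 1 + tol:
            return None  # violates (86)
    return sum(c)


E = [[1.0, 0.0], [0.0, 1.0]]  # noiseless binary channel
print(dual_upper_bound(E, 0.5, c=[2, 2], zeta=2))  # 4: at most 4 messages at eps = 0.5
print(dual_upper_bound(E, 0.0, c=[1, 1], zeta=1))  # 2: at most 2 messages with zero error
```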

A. The zero-error case. 

In [3], it was shown that $M_0^{\mathrm{NS}}(\mathcal{E})$ is given by a linear program which is determined by a combinatorial object associated with $\mathcal{E}$, namely its hypergraph. This subsection recovers that result as a special case of the results developed here. First, some definitions: the hypergraph $H(\mathcal{E})$ of $\mathcal{E}$ has vertex set $V(H) = A$ and hyperedges
$$E(H(\mathcal{E})) := \{e_y := \{x : \mathcal{E}(y|x) > 0\} : y \in B\}, \tag{88}$$
capturing the equivocation of each output symbol $y \in B$. (Note that since the set of hyperedges is defined by its members, these being subsets of $A$, the number of hyperedges may be less than the number of output symbols.) A fractional packing of a hypergraph $H$ is an assignment of non-negative weights $v_x \le 1$ to all vertices $x \in V(H)$ such that
$$\forall e \in E(H):\ \sum_{x\in e} v_x \le 1. \tag{89}$$
A fractional covering of a hypergraph $H$ is an assignment of non-negative weights $c_e \le 1$ to all hyperedges $e \in E(H)$ such that
$$\forall x \in A:\ \sum_{e\ni x} c_e \ge 1. \tag{90}$$
(Restricting the weights to $\{0,1\}$ recovers the combinatorial notions of packing and covering.)

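The fractional packing number $\alpha^*(H)$ is the maximum total weight allowed in a fractional packing of $H$, and the fractional covering number $\omega^*(H)$ is the minimum total weight required for a fractional covering of $H$. These are clearly dual linear programs, which for a channel hypergraph $H(\mathcal{E})$ have the formulation
$$\alpha^*(H(\mathcal{E})) = \max\Big\{\sum_{x\in A} v_x : \forall x\in A,\ v_x \ge 0;\ \forall y\in B,\ \sum_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,v_x \le 1\Big\},$$
$$\omega^*(H(\mathcal{E})) = \min\Big\{\sum_{y\in B} c_y : \forall y\in B,\ c_y \ge 0;\ \forall x\in A,\ \sum_{y\in B}\lceil\mathcal{E}(y|x)\rceil\,c_y \ge 1\Big\}$$
(note that $\lceil\mathcal{E}(y|x)\rceil$ is 0 if $\mathcal{E}(y|x) = 0$ and is otherwise 1).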

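In [3] it was shown that $M_0^{\mathrm{NS}}(\mathcal{E}) = \lfloor\alpha^*(H(\mathcal{E}))\rfloor$. Given Theorem 16, this is equivalent to

Proposition 18.
$$M^*_0(\mathcal{E}) = \omega^*(H(\mathcal{E})) = \alpha^*(H(\mathcal{E})). \tag{91}$$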



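Proof: In the primal LP for $M^*_0(\mathcal{E})$ (Theorem 16), let $v_x$ be any fractional packing of $H(\mathcal{E})$, and let
$$F_{xy} = \begin{cases} v_x & \text{if } \mathcal{E}(y|x) > 0,\\ 0 & \text{otherwise.}\end{cases} \tag{92}$$
Now, the constraints (79) are trivially satisfied, and the constraint (81) is satisfied because $\sum_{x\in A}\sum_{y\in B}\mathcal{E}(y|x)F_{xy} = \sum_{x\in A}\sum_{y\in B}\mathcal{E}(y|x)v_x = \sum_{x\in A} v_x$. For all $y \in B$, $\sum_{x\in A} F_{xy} = \sum_{x:\mathcal{E}(y|x)>0} v_x$, which is less than or equal to one because $v$ is a fractional packing, so the constraints (80) are satisfied. Therefore,
$$\alpha^*(H(\mathcal{E})) \le M^*_0(\mathcal{E}). \tag{93}$$
In the dual LP for $M^*_0(\mathcal{E})$ (Theorem 17), let $c_y$ be any fractional covering of $H(\mathcal{E})$, choose the smallest $\zeta$ such that $\zeta\mathcal{E}(y|x) \ge c_y$ for all $x, y$ with $\mathcal{E}(y|x) > 0$, and let $V_{xy} = \max\{0, \zeta\mathcal{E}(y|x) - c_y\}$. Clearly the constraints (85) are satisfied, and for all $x \in A$,
$$\sum_{y\in B} V_{xy} = \sum_{y:\mathcal{E}(y|x)>0}\big(\zeta\mathcal{E}(y|x) - c_y\big) \le \zeta - 1, \tag{94}$$
as required for (86). Therefore,
$$M^*_0(\mathcal{E}) \le \omega^*(H(\mathcal{E})). \tag{95}$$
Since $\omega^*(H(\mathcal{E})) = \alpha^*(H(\mathcal{E}))$, the result follows. ∎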
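Because the packing and covering programs are dual, a packing and a covering with equal totals certify both numbers. The sketch below (hypothetical helper names) checks (89) and (90) in the per-output form for an example chosen here for illustration: a noisy typewriter on five symbols, where $x$ is received as $x$ or $x+1 \pmod 5$ with probability $1/2$ each. Weights $1/2$ everywhere give $\alpha^* = \omega^* = 5/2$, so by Proposition 18 and Theorem 16, $M_0^{\mathrm{NS}} = \lfloor 5/2\rfloor = 2$ for a single use.

```python
from fractions import Fraction


def hyperedges(E):
    """(88): e_y = {x : E(y|x) > 0} for each output y."""
    return [frozenset(x for x in range(len(E)) if E[x][y] > 0) for y in range(len(E[0]))]


def is_fractional_packing(edges, v):
    """(89): non-negative vertex weights summing to at most 1 on every hyperedge."""
    return all(vx >= 0 for vx in v) and all(sum(v[x] for x in e) <= 1 for e in edges)


def is_fractional_covering(edges, nA, c):
    """(90), per output y: every input x is covered with total weight >= 1."""
    return all(cy >= 0 for cy in c) and all(
        sum(cy for e, cy in zip(edges, c) if x in e) >= 1 for x in range(nA))


n = 5  # noisy typewriter: x -> x or x+1 (mod 5), each with probability 1/2
E = [[Fraction(1, 2) if y in (x, (x + 1) % n) else Fraction(0) for y in range(n)]
     for x in range(n)]
edges = hyperedges(E)
v = [Fraction(1, 2)] * n  # candidate fractional packing
c = [Fraction(1, 2)] * n  # candidate fractional covering
assert is_fractional_packing(edges, v) and is_fractional_covering(edges, n, c)
print(sum(v), sum(c))  # 5/2 5/2
```

Exact `Fraction` arithmetic keeps the feasibility comparisons free of rounding error.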



[Figure 2 image not reproduced.]

Fig. 2. Operational interpretation of the code $\bar Z$ which results from the symmetrisation (97) of a non-signalling code. The boxes marked 'e' and 'd' are the encoder and decoder for the original non-signalling code $Z$. The transformations of the channel input and output are coordinated by a shared random variable $g$ drawn uniformly at random from the group $G$.



B. Taking advantage of symmetry 

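Let $G$ be a group with an action on the input alphabet $A$ and on the output alphabet $B$ (inducing a joint action on $A \times B$), such that
$$\forall g \in G:\ \mathcal{E}(g\circ y|g\circ x) = \mathcal{E}(y|x). \tag{96}$$
For any non-signalling code $Z$ define the code
$$\bar Z(x,\hat w|w,y) := \frac{1}{|G|}\sum_{g\in G} Z(g\circ x,\hat w|w,g\circ y), \tag{97}$$
whose operational interpretation is given in Fig. 2 and which is also non-signalling. The value of $\bar Z(x,\hat w|w,y)$ depends only on $G(x,y)$, that is, the orbit of $(x,y)$ under the joint action of $G$, and since
$$\Pr(\hat W=W|\bar Z,\mathcal{E},S_M) \tag{98}$$
$$= \frac{1}{M|G|}\sum_{g\in G}\sum_{w}\sum_{x\in A}\sum_{y\in B} Z(g\circ x,w|w,g\circ y)\,\mathcal{E}(y|x) \tag{99}$$
$$= \frac{1}{M|G|}\sum_{g\in G}\sum_{w}\sum_{x\in A}\sum_{y\in B} Z(x,w|w,y)\,\mathcal{E}(g^{-1}\circ y|g^{-1}\circ x) \tag{100}$$
$$= \Pr(\hat W=W|Z,\mathcal{E},S_M), \tag{101}$$
the optimisations for $\epsilon^{\mathrm{NS}}(M,\mathcal{E})$ and $M_\epsilon^{\mathrm{NS}}(\mathcal{E})$ in Definition 5 can be restricted to codes with this symmetry.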

Proposition 13 was obtained by showing that one can already restrict to NS codes of the form
$$Z(x,\hat w|w,y) = \begin{cases} R_{xy} & \text{if } \hat w = w,\\ \dfrac{p(x)-R_{xy}}{M-1} & \text{if } \hat w \neq w,\end{cases} \tag{102}$$
without increasing the optimal error probability. Applying the symmetrisation (97) to this expression, $R_{xy}$ and $p(x)$ will only depend on $G(x,y)$ and $Gx$, respectively.

An example where symmetry can be used to great effect is where $\mathcal{E}^n$ (with input alphabet $A^n$ and output alphabet $B^n$) is invariant under the action of the symmetric group $S_n$ that permutes the symbols in the input and output strings. This is true for any DMC, for example.

Following [12], [13], the joint type of a pair of sequences $\mathbf{x} = x_1\ldots x_n \in A^n$ and $\mathbf{y} = y_1\ldots y_n \in B^n$ is the distribution $P_{\mathbf{x},\mathbf{y}}$ on $A\times B$ defined by $nP_{\mathbf{x},\mathbf{y}}(a,b) = N(a,b|\mathbf{x},\mathbf{y})$, where $N(a,b|\mathbf{x},\mathbf{y})$ is the number of values of $i$ for which $(x_i,y_i) = (a,b)$. $\mathcal{P}_n(A\times B)$ denotes the set of all such joint types. Likewise, the type of a sequence $\mathbf{x} \in A^n$ is the distribution $P_{\mathbf{x}}$ on $A$ with $N(a|\mathbf{x}) = nP_{\mathbf{x}}(a)$, and $\mathcal{P}_n(A)$ is the set of these. Given a joint type $\tau_{AB}$, the joint type class $T^n_{\tau_{AB}}$ is the set of all pairs of strings $(\mathbf{x},\mathbf{y})$ with joint type $\tau_{AB}$. Similarly, for a type $\tau_A$, $T^n_{\tau_A} := \{\mathbf{x} \in A^n : P_{\mathbf{x}} = \tau_A\}$.
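Joint types and the sizes of their classes are easy to compute, and for tiny alphabets and block lengths the class-size formula used below in (105) can be confirmed by brute force. A sketch with hypothetical helper names (it needs Python 3.8+ for `math.prod`); a joint type is represented by its counts $n\tau_{AB}(a,b)$:

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def joint_type_counts(x, y):
    """N(a, b | x, y): the number of positions i with (x_i, y_i) = (a, b)."""
    return Counter(zip(x, y))


def type_class_size(counts, n):
    """|T^n_tau| = n! / prod_{a,b} (n tau(a, b))!, where counts holds n * tau."""
    return factorial(n) // prod(factorial(k) for k in counts.values())


# Brute-force check for binary alphabets and n = 4: group all pairs by joint type.
n = 4
classes = Counter()
for x in product(range(2), repeat=n):
    for y in product(range(2), repeat=n):
        classes[frozenset(joint_type_counts(x, y).items())] += 1

assert all(type_class_size(dict(t), n) == size for t, size in classes.items())
print(len(classes), sum(classes.values()))  # 35 256
```

Only 35 joint types index the 256 string pairs here; in general there are $\binom{n+|A||B|-1}{|A||B|-1}$ joint types, which is polynomial in $n$.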

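As is well known, the orbits of $A^n \times B^n$ under the joint action of the symmetric group described above are precisely the joint type classes $T^n_{\tau_{AB}}$, for each joint type in $\mathcal{P}_n(A\times B)$, and $\mathcal{E}^n(\mathbf{y}|\mathbf{x})$ is a function only of the joint type of $\mathbf{x}$ and $\mathbf{y}$:
$$\mathcal{E}^n(\mathbf{y}|\mathbf{x}) = \bar{\mathcal{E}}^n(P_{\mathbf{x},\mathbf{y}}). \tag{103}$$
For a DMC, and joint type $\tau_{AB} \in \mathcal{P}_n(A\times B)$,
$$\bar{\mathcal{E}}^n(\tau_{AB}) = \prod_{a\in A,\,b\in B}\mathcal{E}(b|a)^{n\tau_{AB}(a,b)}. \tag{104}$$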



Therefore, in the primal formulation of $\epsilon^{\mathrm{NS}}(M,\mathcal{E}^n)$ (Proposition 13) one can take $R_{\mathbf{x}\mathbf{y}} = R(P_{\mathbf{x},\mathbf{y}})$ and $p(\mathbf{x}) = p(P_{\mathbf{x}})$ for all $\mathbf{x}, \mathbf{y}$, and replace the sums over the input and output strings with sums over joint types (or types) which incorporate the correct multiplicity factors.

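The objective function in (38) becomes
$$\sum_{\tau_{AB}\in\mathcal{P}_n(A\times B)} |T^n_{\tau_{AB}}|\,R(\tau_{AB})\,\bar{\mathcal{E}}^n(\tau_{AB}), \tag{105}$$
where $|T^n_{\tau_{AB}}| = n!/\prod_{a\in A,\,b\in B}(n\tau_{AB}(a,b))!$. Similarly, the normalisation (42) of $p$ becomes
$$\sum_{\tau_A\in\mathcal{P}_n(A)} |T^n_{\tau_A}|\,p(\tau_A) = 1, \tag{106}$$
where $|T^n_{\tau_A}| = n!/\prod_{a\in A}(n\tau_A(a))!$. In (40) there is a constraint on a sum over $A^n$ for each output string in $B^n$. The number of pairs $(\mathbf{x},\mathbf{y})$ with joint type equal to $\tau_{AB}$, for fixed $\mathbf{y}$, depends only on $P_{\mathbf{y}}$, and is equal to
$$m_n(\tau_{AB}; P_{\mathbf{y}}) := \begin{cases}\displaystyle\prod_{b\in B}\frac{(n\tau_B(b))!}{\prod_{a\in A}(n\tau_{AB}(a,b))!} & \text{if } \tau_B = P_{\mathbf{y}},\\ 0 & \text{otherwise.}\end{cases} \tag{107}$$
(Note that if the joint type of $(\mathbf{x},\mathbf{y})$ is $\tau_{AB}$, then the marginal distribution $\tau_A$ is the type of $\mathbf{x}$, and $\tau_B$ is the type of $\mathbf{y}$.) Therefore, (40) can be replaced by
$$\forall \sigma_B \in \mathcal{P}_n(B):\ \sum_{\tau_{AB}\in\mathcal{P}_n(A\times B)} m_n(\tau_{AB};\sigma_B)\,R(\tau_{AB}) \le 1/M. \tag{108}$$
The remaining constraints are equivalent to
$$\forall \tau_{AB} \in \mathcal{P}_n(A\times B):\ 0 \le R(\tau_{AB}) \le p(\tau_A). \tag{109}$$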
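The multiplicity $m_n$ in (107) can be checked against a brute-force count for a tiny case. A sketch with a hypothetical function `m_n`; types are represented by their counts, so `tau_counts[(a, b)]` is $n\tau_{AB}(a,b)$ and `sigma_counts[b]` is $n\sigma_B(b)$, and the example alphabets are chosen here for illustration.

```python
from collections import Counter
from itertools import product
from math import factorial, prod


def m_n(tau_counts, sigma_counts):
    """(107): number of x in A^n with joint type (x, y) = tau, for a fixed y of type sigma."""
    marginal = Counter()
    for (a, b), k in tau_counts.items():
        marginal[b] += k

    def nonzero(d):
        return {key: k for key, k in d.items() if k}

    if nonzero(marginal) != nonzero(sigma_counts):
        return 0  # the B-marginal of tau differs from the type of y
    return (prod(factorial(k) for k in sigma_counts.values())
            // prod(factorial(k) for k in tau_counts.values()))


# Brute force with A = {0, 1, 2}, B = {0, 1}, n = 3 and one fixed output string y.
y = (0, 1, 1)
sigma = Counter(y)
by_type = Counter(frozenset(Counter(zip(x, y)).items()) for x in product(range(3), repeat=3))
assert all(m_n(dict(t), sigma) == cnt for t, cnt in by_type.items())
print(sum(by_type.values()))  # 27 = |A|^n
```

Summed over the joint types whose $B$-marginal matches, the multiplicities recover all $|A|^n$ input strings, which is why (108) reproduces each constraint in (40).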



Since the number of (joint) types is polynomial in $n$ [12], [13], the number of variables and constraints in the simplified LP given above is polynomial in $n$, and this is also true of the dual of this program. The linear programs derived for $M_\epsilon^{\mathrm{NS}}(\mathcal{E}^n)$ can be simplified similarly.



IV. Assisted zero-error capacities of discrete memoryless channels with zero dispersion

As shown in [3], for any DMC, it follows from Proposition 18 and the multiplicativity of the fractional packing number that
$$C_0^{\mathrm{NS}}(\mathcal{E}) = \log\alpha^*(H(\mathcal{E})). \tag{110}$$



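A simulation $\mathcal{L}$ of size $k$ for the channel use $\mathcal{E}$ consists of an encoder which takes an input $X$ from $A$ and produces a message $J$ in $\{1,\ldots,k\}$, and a decoder which takes a message $\hat J$ in $\{1,\ldots,k\}$ and produces an output $Y$ from $B$. The simulation $\mathcal{L}$ determines the probabilities
$$\mathcal{L}(j,y|x,\hat j) := \Pr(J=j,Y=y|X=x,\hat J=\hat j,\mathcal{L}). \tag{111}$$
We assume that the message is perfectly transmitted from the encoder to the decoder, so $\hat J = J$. The simulation is exact if
$$\Pr(Y=y|X=x,\mathcal{L}) = \sum_{j}\mathcal{L}(j,y|x,j) = \mathcal{E}(y|x). \tag{112}$$
A non-signalling (NS) simulation is one where
$$\Pr(Y=y|X=x,\hat J=\hat j,\mathcal{L}) = \Pr(Y=y|\hat J=\hat j,\mathcal{L}), \tag{113}$$
$$\Pr(J=j|X=x,\hat J=\hat j,\mathcal{L}) = \Pr(J=j|X=x,\mathcal{L}). \tag{114}$$
$K_0^{\mathrm{NS}}(\mathcal{E})$ denotes the minimum size of an exact NS simulation of $\mathcal{E}$, and
$$K^{\mathrm{NS}}(\mathcal{E}) := \lim_{n\to\infty}\frac{1}{n}\log K_0^{\mathrm{NS}}(\mathcal{E}^n) \tag{115}$$
is the (asymptotic) exact simulation cost of $\mathcal{E}$. In [3] it was shown that, for any DMC,
$$K^{\mathrm{NS}}(\mathcal{E}) = \log\sum_{y\in B}\max_{x\in A}\mathcal{E}(y|x). \tag{116}$$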
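Formula (116) is a one-line computation. A sketch (the function name is hypothetical; logarithms in base 2, so the result is in bits per channel use), applied to two example channels chosen here for illustration: a binary symmetric channel with crossover $0.11$ and the noisy typewriter on five symbols ($x \mapsto x$ or $x+1 \bmod 5$, each with probability $1/2$).

```python
from math import log2


def ns_exact_simulation_cost(E):
    """(116): K^NS(E) = log2 sum_y max_x E(y|x), with E[x][y] = E(y|x)."""
    return log2(sum(max(row[y] for row in E) for y in range(len(E[0]))))


bsc = [[0.89, 0.11], [0.11, 0.89]]
typewriter = [[0.5 if y in (x, (x + 1) % 5) else 0.0 for y in range(5)] for x in range(5)]
print(ns_exact_simulation_cost(bsc))         # log2(1.78), about 0.832
print(ns_exact_simulation_cost(typewriter))  # log2(2.5), about 1.322
```

For this BSC the capacity $1-h(0.11)$ is about $0.5$ bits, strictly below $K^{\mathrm{NS}}$. For the typewriter, both $K^{\mathrm{NS}}$ and the capacity equal $\log_2 2.5$.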



In what follows, $\mathcal{E}$ is omitted as an argument, since it refers to some fixed channel. For any discrete channel,
$$C_0^{\mathrm{NS}} \le C \le K^{\mathrm{NS}}. \tag{117}$$
From now on, let $\mathcal{E}$ be a DMC. Proposition 26 of [3] shows that, given any requirement on which transition probabilities in $\mathcal{E}$ must be zero, it is possible to find an $\mathcal{E}$ that satisfies that requirement and has all three quantities in (117) equal. In [6] it was shown that there are DMCs where even the entanglement-assisted zero-error capacity $C_0^{\mathrm{SE}}$ reaches $C$ (and with a block length one entanglement-assisted code) despite the unassisted zero-error capacity $C_0$ being strictly smaller.

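For a DMC, $V$ is the minimum variance of the information density of the channel for the joint distribution induced by any capacity-achieving input distribution for a single channel use. The information density is
$$i(x;y) = \log\frac{\mathcal{E}(y|x)}{q(y)}, \tag{118}$$
where $q \in \mathcal{P}(B)$ is the output distribution. The capacity is the expectation of the information density. Therefore, $V = 0$ iff there exists a capacity-achieving input distribution $p \in \mathcal{P}(A)$ (with induced output distribution $q$) such that (118) is equal to $C$ whenever $\mathcal{E}(y|x)p(x)$ is non-zero, i.e. if and only if, for all $x$ such that $p(x) > 0$,
$$\mathcal{E}(y|x) = \lceil\mathcal{E}(y|x)\rceil\,q(y)\,2^{C}. \tag{119}$$
If $V$ is zero, then the $\sqrt{n}$ term vanishes in the asymptotic expansion (27). In this sense, a channel with zero dispersion admits qualitatively more efficient codes (in terms of approaching capacity with increasing block length) than a channel with positive variance does. It turns out that a channel has zero dispersion if and only if its capacity can be achieved with zero error by NS codes.

Theorem 19. For a DMC $\mathcal{E}$ the three conditions
1) $C_0^{\mathrm{NS}}(\mathcal{E}) = C(\mathcal{E})$,
2) $K^{\mathrm{NS}}(\mathcal{E}) = C(\mathcal{E})$,
3) $V(\mathcal{E}) = 0$
are equivalent.

Proof: The following propositions show that (3) implies (1) and (2); that (1) implies (3); and that (2) implies (3). ∎

Proposition 20. If $V = 0$ then $C_0^{\mathrm{NS}}$ and $K^{\mathrm{NS}}$ are both equal to $C$.

Proof: We show that if $V(\mathcal{E}) = 0$ then the opposite inequalities to those in (117) also hold. Using (119),
$$K^{\mathrm{NS}} = \log\sum_{y\in B}\max_{x\in A}\mathcal{E}(y|x) \tag{120}$$
$$\le \log\sum_{y\in B}\max_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,q(y)\,2^{C} \le C. \tag{121}$$
For the other part, when $q(y)$ is non-zero,
$$\sum_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,p(x)\,2^{C} = \sum_{x\in A}\frac{\mathcal{E}(y|x)}{q(y)}\,p(x) = 1, \tag{122}$$
and when $q(y)$ is zero we must have $\lceil\mathcal{E}(y|x)\rceil p(x) = 0$ for all $x \in A$, so that for all $y \in B$
$$\sum_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,p(x)\,2^{C} \le 1. \tag{123}$$
Therefore $p(x)2^{C}$ is a fractional packing, and $C_0^{\mathrm{NS}} \ge C$. ∎

Definition 21. For a probability distribution $p$ on $A$,
$$\alpha^*(\mathcal{E},p) := \max\Big\{\alpha : \forall y,\ \sum_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,\alpha\,p(x) \le 1\Big\}, \tag{124}$$
which is equivalent to
$$\log\alpha^*(\mathcal{E},p) = -\max_{y}\log\sum_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,p(x). \tag{125}$$
Clearly the fractional packing number is given by $\alpha^*(\mathcal{E}) = \max_p\alpha^*(\mathcal{E},p)$, where the maximum is over probability distributions $p$ on the input alphabet.

Lemma 22. Let $I(\mathcal{E},p)$ denote the mutual information between channel input and output when the input has probability mass function $p$. Then
$$I(\mathcal{E},p) \ge \log\alpha^*(\mathcal{E},p). \tag{126}$$

Proof: Let $q(y) = \sum_{x\in A}p(x)\mathcal{E}(y|x)$. Then
$$\log\alpha^*(\mathcal{E},p) = -\max_{y}\log\sum_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,p(x) \tag{127}$$
$$\le -\sum_{y\in B}q(y)\log\sum_{x\in A}\lceil\mathcal{E}(y|x)\rceil\,p(x) \tag{128}$$
$$= -\sum_{y\in B}\sum_{x\in A}\mathcal{E}(y|x)\,p(x)\log\sum_{x'\in A}\lceil\mathcal{E}(y|x')\rceil\,p(x'). \tag{129}$$
Subtracting $I(\mathcal{E},p)$ from this last expression one obtains
$$\sum_{y\in B}\sum_{x:\mathcal{E}(y|x)>0}\mathcal{E}(y|x)\,p(x)\log\frac{\sum_{x''}\mathcal{E}(y|x'')p(x'')}{\mathcal{E}(y|x)\sum_{x'}\lceil\mathcal{E}(y|x')\rceil p(x')}, \tag{130}$$
which is never larger than zero because, using $\log x \le (x-1)/\ln 2$, it is at most
$$\frac{1}{\ln 2}\sum_{y\in B}\sum_{x:\mathcal{E}(y|x)>0}\mathcal{E}(y|x)\,p(x)\Big(\frac{\sum_{x''}\mathcal{E}(y|x'')p(x'')}{\mathcal{E}(y|x)\sum_{x'}\lceil\mathcal{E}(y|x')\rceil p(x')} - 1\Big) \tag{131}$$
$$= \frac{1}{\ln 2}\sum_{y\in B}\Big(\frac{q(y)\sum_{x:\mathcal{E}(y|x)>0}p(x)}{\sum_{x'}\lceil\mathcal{E}(y|x')\rceil p(x')} - \sum_{x\in A}\mathcal{E}(y|x)p(x)\Big) \tag{132}$$
$$= \frac{1}{\ln 2}\sum_{y\in B}\big(q(y) - q(y)\big) \tag{133}$$
$$= 0. \tag{134}$$
∎

Proposition 23. If $C_0^{\mathrm{NS}} = C$ then $V = 0$.

Proof: Suppose that $\mathcal{E}(y|x)$ is a channel with $C_0^{\mathrm{NS}} = C$. Let $w: A \to [0,1]$ be any optimal fractional packing for the channel and let $\alpha^*$ be the fractional packing number. By the preceding lemma, $p(x) = w(x)/\alpha^*$ defines a capacity-achieving input probability mass function for the channel. Let $q$ be the corresponding output probability mass function. It was shown in [14] that if $p$ is capacity achieving then
$$D(\mathcal{E}(\cdot|x)\,\|\,q)\ \begin{cases} = C & \text{when } p(x) > 0,\\ \le C & \text{when } p(x) = 0.\end{cases} \tag{135}$$
Since $C = \log\alpha^*$ by assumption, these conditions imply that, for all $x \in A$,
$$0 \le \log\alpha^* - D(\mathcal{E}(\cdot|x)\,\|\,q) = \sum_{y:\mathcal{E}(y|x)>0}\mathcal{E}(y|x)\log\frac{\alpha^* q(y)}{\mathcal{E}(y|x)}. \tag{136}$$
Using $\log x \le (x-1)/\ln 2$ again,
$$0 \le \frac{1}{\ln 2}\sum_{y:\mathcal{E}(y|x)>0}\mathcal{E}(y|x)\Big(\frac{\alpha^* q(y)}{\mathcal{E}(y|x)} - 1\Big) \tag{137}$$
$$= \frac{1}{\ln 2}\Big(\sum_{y\in B}\alpha^* q(y)\lceil\mathcal{E}(y|x)\rceil - \sum_{y\in B}\mathcal{E}(y|x)\Big) \tag{138}$$
$$= \frac{1}{\ln 2}\Big(\sum_{y\in B}\alpha^* q(y)\lceil\mathcal{E}(y|x)\rceil - 1\Big). \tag{139}$$
Therefore, $c_y := \alpha^* q(y)$ is a fractional covering for the channel hypergraph, and it is optimal (its total weight is $\alpha^* = \omega^*$). Furthermore, the complementary slackness condition demands that when $p(x) > 0$ the corresponding inequality must be saturated. Therefore, when $\mathcal{E}(y|x)p(x) > 0$ we must have $\alpha^* q(y)/\mathcal{E}(y|x) - 1 = 0$, or $\log\mathcal{E}(y|x)/q(y) = \log\alpha^*$, so the variance of the information density is zero for this capacity-achieving distribution. ∎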
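The quantities in Theorem 19 and Lemma 22 can be evaluated for a given input distribution. The sketch below (hypothetical helper names, logarithms in base 2) computes the mean $I(\mathcal{E},p)$ and the variance of the information density (118), together with $\log\alpha^*(\mathcal{E},p)$ from (125). The variance equals the dispersion $V$ only when $p$ is a capacity-achieving distribution minimising it. That is assumed here, not checked; the uniform input is capacity-achieving for the two symmetric example channels chosen for illustration.

```python
from math import log2


def information_density_stats(E, p):
    """Mean I(E, p) and variance of i(x; y) = log2(E(y|x) / q(y)) under p(x) E(y|x).

    E[x][y] = E(y|x).  The variance is the dispersion V only if p is a
    capacity-achieving input minimising it (assumed, not checked).
    """
    nA, nB = len(E), len(E[0])
    q = [sum(p[x] * E[x][y] for x in range(nA)) for y in range(nB)]
    terms = [(p[x] * E[x][y], log2(E[x][y] / q[y]))
             for x in range(nA) for y in range(nB) if p[x] * E[x][y] > 0]
    mean = sum(w * i for w, i in terms)
    var = sum(w * (i - mean) ** 2 for w, i in terms)
    return mean, var


def log_alpha_star(E, p):
    """(125): log2 alpha*(E, p) = -log2 max_y sum_{x : E(y|x) > 0} p(x)."""
    nA, nB = len(E), len(E[0])
    return -log2(max(sum(p[x] for x in range(nA) if E[x][y] > 0) for y in range(nB)))


typewriter = [[0.5 if y in (x, (x + 1) % 5) else 0.0 for y in range(5)] for x in range(5)]
bsc = [[0.89, 0.11], [0.11, 0.89]]
uniform5, uniform2 = [0.2] * 5, [0.5] * 2
print(information_density_stats(typewriter, uniform5), log_alpha_star(typewriter, uniform5))
print(information_density_stats(bsc, uniform2), log_alpha_star(bsc, uniform2))
```

For the typewriter the variance vanishes, and $\log_2\alpha^*(\mathcal{E},p)$ equals the mutual information $\log_2 2.5$, so Lemma 22 holds with equality. For the BSC the variance is positive and $\log_2\alpha^*(\mathcal{E},p)$ is zero, strictly below $I(\mathcal{E},p)$.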

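Proposition 24. If $K^{\mathrm{NS}} = C$ then $V = 0$.

Proof: Let $p$ be a capacity-achieving probability mass function, with output distribution $q$. Then
$$C = \sum_{x\in A}\sum_{y\in B}p(x)\,\mathcal{E}(y|x)\log\frac{\mathcal{E}(y|x)}{q(y)} \tag{140}$$
$$\le \log\sum_{x\in A}\sum_{y\in B}p(x)\,\mathcal{E}(y|x)\,\frac{\mathcal{E}(y|x)}{q(y)} \tag{141}$$
$$\le \log\sum_{x\in A}\sum_{y\in B}p(x)\,\mathcal{E}(y|x)\,\frac{\max_{x'}\mathcal{E}(y|x')}{q(y)} \tag{142}$$
$$\le \log\sum_{y\in B}\max_{x\in A}\mathcal{E}(y|x) \tag{143}$$
$$= K^{\mathrm{NS}}. \tag{144}$$
For equality to hold, Jensen's inequality (141) must be saturated. This happens if and only if
$$\frac{\mathcal{E}(y|x)}{q(y)} = 2^{C} \tag{145}$$
for all $x, y$ such that $\mathcal{E}(y|x)p(x) > 0$, which is equivalent to $V = 0$. ∎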
V. Conclusion

It was shown that the maximum size of a non-signalling code with a given error probability is given by the integer part of the solution to a linear program, and that this is equal to the converse bound of Polyanskiy, Poor and Verdú [1], thus giving an alternative proof of that result. When $n$ uses of the channel are symmetric under simultaneous permutations of the input and output strings, the LP can be simplified to one with $\mathrm{poly}(n)$ variables and constraints.

It was also proven that the capacity of a DMC is achieved with zero error by NS codes if and only if the channel has zero dispersion, and therefore already admits especially efficient classical codes.

It would be interesting to see if the dual linear programming formulation of the converse given in this paper can help in extending the finite block length results given in [1]. The technique of using non-signalling assistance to obtain linear program converses for classical coding protocols extends naturally to multi-terminal situations like broadcast or multiple access channels, and may prove useful in this context.